11 research outputs found

Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale

Author: Hani Z. Girgis
Publication venue: Springer Nature
Publication date: 01/01/2015

Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

Background: Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled by mainly non-coding elements known as cis-regulatory modules (CRMs) and epigenetic factors. CRMs modulating related genes share the regulatory signature which consists of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is a challenging problem due to the prohibitive number of sequence sets that need to be analyzed. Results: We formulated the challenge as a supervised classification problem even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner. The system mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes comprise the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set is made of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set consisting of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and the control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets up to 1:25 signal-to-noise ratio. CrmMiner was 21-75% more precise than a related CRM predictor. The sensitivity of the system to locate known human heart enhancers reached up to 83%. CrmMiner precision reached 82% while mining for CRMs specific to the human CD4+ T cells. On several data sets, the system achieved 99% specificity. Conclusion: These results suggest that CrmMiner predictions are accurate and likely to be tissue-specific CRMs. We expect that the predicted tissue-specific CRMs and the regulatory signatures broaden our knowledge of gene transcription regulation.
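The abstract describes finding pairs of closely located motifs that are enriched in the mixed set relative to the control set. A minimal sketch of that idea follows. It is not CrmMiner's implementation: the abstract does not state the enrichment statistic or the distance cutoff, so the ratio of smoothed fractions, the `max_gap` value, the input format, and the motif names used as toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def close_pairs(hits, max_gap):
    """Return the unordered motif pairs whose hits lie within max_gap bases.

    hits: list of (motif_name, position) tuples for one sequence
    (a hypothetical input format, not CrmMiner's).
    """
    pairs = set()
    for (m1, p1), (m2, p2) in combinations(hits, 2):
        if m1 != m2 and abs(p1 - p2) <= max_gap:
            pairs.add(tuple(sorted((m1, m2))))
    return pairs


def pair_enrichment(mixed, control, max_gap=50, pseudocount=1.0):
    """Score each pair seen in the mixed set by (mixed fraction) / (control fraction).

    The pseudocount keeps the ratio finite when a pair never occurs in the control set.
    """
    mixed_counts = Counter(p for hits in mixed for p in close_pairs(hits, max_gap))
    control_counts = Counter(p for hits in control for p in close_pairs(hits, max_gap))
    scores = {}
    for pair, n_mixed in mixed_counts.items():
        f_mixed = (n_mixed + pseudocount) / (len(mixed) + pseudocount)
        f_control = (control_counts[pair] + pseudocount) / (len(control) + pseudocount)
        scores[pair] = f_mixed / f_control
    return scores


# Toy data: three sequences per set, with made-up motif hits.
mixed = [
    [("GATA", 10), ("MEF2", 40)],
    [("GATA", 5), ("MEF2", 30), ("SP1", 400)],
    [("SP1", 7)],
]
control = [
    [("GATA", 10), ("SP1", 500)],
    [("MEF2", 3)],
    [("SP1", 1), ("GATA", 20)],
]
scores = pair_enrichment(mixed, control)
print(scores)
```

In this toy input only the GATA/MEF2 pair occurs within 50 bases in the mixed set, so it is the only pair scored. A classifier such as the Bayesian one mentioned in the abstract would be a separate, later step.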

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

JavaDD: a Declarative Debugger for Java

Author: Bharat Jayaraman
Hani Z. Girgis

This paper presents a declarative approach to the debugging of object-oriented programs and illustrates the methodology through an extension of a novel interactive visualization system for Java developed in our previous research. In contrast to traditional “procedural” debugging, we use the term “declarative debugging” to refer to a flexible set of queries on individual execution states and also over the entire history of execution (or a portion of the history). Examples include queries to find all values assigned to a variable over its lifetime; which variable has a certain value; the calling sequence that results in a certain outcome; whether a certain statement was executed; etc. These queries were arrived at by a systematic study of errors in object-oriented programs in our previous research. Our proposed system, JavaDD, maintains the execution history as a relational database of salient events, such as method call/return, thread start/end, variable assignment, etc. An important property of our approach is that these queries can be posed interactively (at any step of execution), and there is no need to develop a compiler to instrument the source code, as in related research projects. Furthermore, we also sketch a visual interface so that both queries and answers can be composed using intuitive object and sequence diagrams. We believe such an approach is a significant contribution to the art of program debugging. We present the architecture of JavaDD, a detailed catalog of our queries and their translation, and several examples illustrating the approach. We also compare our approach with related research efforts in the area of query-based analysis of object-oriented programs.
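The idea of keeping the execution history as a relational database and debugging by querying it can be illustrated with a small sketch. This is not JavaDD's schema or query language, neither of which the abstract gives: the `events` table, its columns, and the helper functions are hypothetical, and Python's built-in `sqlite3` stands in for whatever database the system uses.

```python
import sqlite3

# In-memory database holding a made-up execution trace.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
           step     INTEGER PRIMARY KEY,  -- position in the execution history
           kind     TEXT,                 -- 'call', 'return', or 'assign'
           method   TEXT,                 -- method in which the event occurred
           variable TEXT,                 -- assigned variable (NULL otherwise)
           value    TEXT                  -- assigned value (NULL otherwise)
       )"""
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "call", "main", None, None),
        (2, "assign", "main", "total", "0"),
        (3, "call", "add", None, None),
        (4, "assign", "add", "total", "5"),
        (5, "return", "add", None, None),
        (6, "assign", "main", "total", "-1"),
    ],
)


def values_assigned(variable):
    """All values assigned to a variable over its lifetime, in execution order."""
    rows = conn.execute(
        "SELECT step, value FROM events "
        "WHERE kind = 'assign' AND variable = ? ORDER BY step",
        (variable,),
    )
    return rows.fetchall()


def was_called(method):
    """Whether a given method was ever invoked in the recorded history."""
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE kind = 'call' AND method = ?",
        (method,),
    ).fetchone()
    return row[0] > 0


print(values_assigned("total"))
print(was_called("add"), was_called("remove"))
```

Because the history is ordinary relational data, a new kind of question needs only a new query rather than re-instrumenting and re-running the program.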

CiteSeerX

HebbPlot: an intelligent tool for learning and visualizing chromatin mark signatures

Author: Alfredo Velasco
Hani Z. Girgis
Zachary E. Reyes
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2018

Background: Histone modifications play important roles in gene regulation, heredity, imprinting, and many human diseases. The histone code is complex and consists of more than 100 marks. Therefore, biologists need computational tools to characterize general signatures representing the distributions of tens of chromatin marks around thousands of regions. Results: To this end, we developed a software tool, HebbPlot, which utilizes a Hebbian neural network in learning a general chromatin signature from regions with a common function. Hebbian networks can learn the associations between tens of marks and thousands of regions. HebbPlot presents a signature as a digital image, which can be easily interpreted. Moreover, signatures produced by HebbPlot can be compared quantitatively. We validated HebbPlot in six case studies. The results of these case studies are either novel or validate results already reported in the literature, indicating the accuracy of HebbPlot. Our results indicate that promoters have a directional chromatin signature; several marks tend to stretch downstream or upstream. H3K4me3 and H3K79me2 have clear directional distributions around active promoters. In addition, the signatures of high- and low-CpG promoters are different; H3K4me3, H3K9ac, and H3K27ac are the most different marks. When we studied the signatures of enhancers active in eight tissues, we observed that these signatures are similar, but not identical. Further, we identified some histone modifications (H3K36me3, H3K79me1, H3K79me2, and H4K8ac) that are associated with coding regions of active genes. Other marks (H4K12ac, H3K14ac, H3K27me3, and H2AK5ac) were found to be weakly associated with coding regions of inactive genes. Conclusions: This study resulted in a novel software tool, HebbPlot, for learning and visualizing the chromatin signature of a genetic element. Using HebbPlot, we produced a visual catalog of the signatures of multiple genetic elements in 57 cell types available through the Roadmap Epigenomics Project. Furthermore, we made progress toward a functional catalog consisting of 22 histone marks. In sum, HebbPlot is applicable to a wide array of studies, facilitating the deciphering of the histone code.
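To make the Hebbian learning idea concrete, here is a minimal sketch of the textbook Hebbian rule (weight change proportional to input times output) applied to mark vectors. It is not HebbPlot's network, which the abstract does not specify: the +1/-1 encoding of marks, the fixed output of 1 for every training region, the learning rate of 1/n, and the toy values are all assumptions.

```python
def hebbian_signature(regions):
    """Learn one weight per mark with the plain Hebbian rule w_i += rate * x_i * y.

    regions: list of equal-length vectors, one per region, where each entry is
    +1 if the mark is present and -1 if absent (an assumed encoding).
    With rate = 1/n and y = 1, each weight ends up as the mean of that mark.
    """
    n_marks = len(regions[0])
    rate = 1.0 / len(regions)
    weights = [0.0] * n_marks
    for x in regions:
        y = 1.0  # all training regions are assumed to share one function
        for i, x_i in enumerate(x):
            weights[i] += rate * x_i * y
    return weights


# Toy data, not real measurements: columns are H3K4me3, H3K27ac, H3K27me3.
marks = ["H3K4me3", "H3K27ac", "H3K27me3"]
regions = [
    [1, 1, -1],
    [1, -1, -1],
    [1, 1, -1],
    [1, 1, -1],
]
signature = hebbian_signature(regions)
for name, weight in zip(marks, signature):
    print(f"{name}: {weight:+.2f}")
```

Weights near +1 or -1 indicate marks that are consistently present or absent across the regions, and intermediate values indicate inconsistent marks. Such a vector could then be drawn as a row of colored cells, in the spirit of the image representation the abstract mentions.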

Directory of Open Access Journals

FigShare

MeShClust: an intelligent tool for clustering DNA sequences

Author: Benjamin T James
Brian B Luczak
Hani Z Girgis
Publication venue: Oxford University Press (OUP)

Crossref

A survey and evaluations of histogram-based statistics in alignment-free sequence comparison

Author: Benjamin T James
Brian B Luczak
Hani Z Girgis
Publication venue: Oxford University Press (OUP)

Crossref

MsDetector: toward a standard computational tool for DNA microsatellites detection

Author: Hani Z. Girgis
Sergey L. Sheetlin
Publication venue: Oxford University Press (OUP)

Crossref

Benchmarking of alignment-free sequence comparison methods

Author: Almeida Jonas S.
Bernard Guillaume
Chan Cheong Xin
Choi Jae Jin
Comin Matteo
Dencker Thomas
Girgis Hani Z.
James Benjamin T.
Karlowski Wojciech M.
Kim Sung-Hou
Lau Anna Katharina
Leimeister Chris-Andre
Morgenstern Burkhard
Röhling Sophie
Sun Fengzhu
Tang Kujin
Vinga Susana
Waterman Michael S.
Zielezinski Andrzej
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Background: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but the lack of a clearly defined benchmarking consensus hampers their performance assessment. Results: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. Conclusion: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.

Crossref

HAL Descartes

Archivio istituzionale della ricerca - Università di Padova

University of Queensland eSpace

Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale

Author: Hani Z. Girgis
Publication venue: Springer Science and Business Media LLC

Crossref